Welcome to the 2025 Introduction to R Canadian Bioinformatics Workshop webpage!
Principal Scientist and Adjunct Professor
Vaccine and Infectious Disease Organization (VIDO), University of Saskatchewan
Saskatoon, Saskatchewan, Canada
Mohamed is a Computational Systems Biologist and Principal Scientist leading the Bioinformatics and Systems Biology Lab (BSBL) at the Vaccine and Infectious Disease Organization (VIDO), University of Saskatchewan. He received his MSc and PhD in Computational Systems Biology from Keio University (Tokyo, Japan) and completed his postdoctoral training in bioinformatics at Kyoto University and the University of Toronto. Mohamed’s interdisciplinary research profile bridges biology, computer science, and public health.
Graduate student
Vaccine and Infectious Disease Organization (VIDO), University of Saskatchewan
Saskatoon, Saskatchewan, Canada
Sylvia is a computer science MSc student at the University of Saskatchewan, supervised by Dr. Helmy. She holds dual BSc degrees in Bioinformatics and Computer Science. Her current work focuses on bacterial genomic data.
Data and Compute Setup
Coming soon!
Create two numeric variables and assign a value to each
x <- 10
y <- 6
Calculate their sum
total <- x + y
total
## [1] 16
Calculate the square root of the total
sr <- sqrt(total)
sr
## [1] 4
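R also provides the other basic arithmetic operators beyond `+` and `sqrt()`. The sketch below reuses the same two values. The results in the comments follow from x = 10 and y = 6.

```r
x <- 10
y <- 6

x - y     # subtraction: 4
x * y     # multiplication: 60
x / y     # division, always returns a decimal number: 1.666667
x ^ 2     # exponentiation: 100
x %% y    # remainder after integer division (modulo): 4
x %/% y   # integer division: 1
```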
Vector
v <- c(1, 2, 3, 4)
v
## [1] 1 2 3 4
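Vectors are usually worked with as a whole rather than one element at a time. The sketch below shows element-wise arithmetic and indexing on the same vector. R counts positions from 1, not 0.

```r
v <- c(1, 2, 3, 4)

v * 2                  # arithmetic is applied element by element: 2 4 6 8
v + c(10, 20, 30, 40)  # element-wise addition of two equal-length vectors: 11 22 33 44
v[2]                   # indexing starts at 1, so this is the second element: 2
v[v > 2]               # logical indexing keeps elements meeting the condition: 3 4
length(v)              # number of elements: 4
sum(v)                 # total of all elements: 10
```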
Matrix
m <- matrix(1:6, nrow = 2)
m
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
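The printed matrix shows that `matrix()` fills values column by column by default. The sketch below shows how to index rows, columns and single cells with `[row, column]`. It also shows how `byrow = TRUE` changes the fill order.

```r
m <- matrix(1:6, nrow = 2)

dim(m)        # number of rows and columns: 2 3
m[2, 3]       # element in row 2, column 3: 6
m[, 2]        # the whole second column: 3 4
m[1, ]        # the whole first row: 1 3 5
t(m)          # transpose: a 3 x 2 matrix
matrix(1:6, nrow = 2, byrow = TRUE)  # fill row by row instead: rows 1 2 3 and 4 5 6
```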
Data frame
df <- data.frame(age = c(25, 30), name = c("Mo", "Tom"), group = c("A", "B"))
df
## age name group
## 1 25 Mo A
## 2 30 Tom B
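Each column of a data frame is a vector, so the vector tools above carry over. The sketch below shows common ways to pull out columns, rows and filtered subsets. The `older` column is a hypothetical example added only to show how a new column is created.

```r
df <- data.frame(age = c(25, 30), name = c("Mo", "Tom"), group = c("A", "B"))

df$age                       # one column as a vector: 25 30
df[2, ]                      # the second row, all columns
df[df$group == "B", "name"]  # column "name" for rows where group is "B": "Tom"
nrow(df)                     # number of rows: 2
mean(df$age)                 # column-wise statistics work directly: 27.5
df$older <- df$age > 26      # hypothetical new logical column: FALSE TRUE
```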
List
lst <- list(numbers=v, info=df)
lst
## $numbers
## [1] 1 2 3 4
##
## $info
## age name group
## 1 25 Mo A
## 2 30 Tom B
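A list can hold objects of different types, so getting elements back out matters. The sketch below rebuilds the same list and contrasts `$`, `[[ ]]` and `[ ]`. Single brackets return a smaller list, while double brackets return the element itself.

```r
v <- c(1, 2, 3, 4)
df <- data.frame(age = c(25, 30), name = c("Mo", "Tom"), group = c("A", "B"))
lst <- list(numbers = v, info = df)

names(lst)       # element names: "numbers" "info"
lst$numbers      # access an element by name with $
lst[["info"]]    # same idea with [[ ]]; returns the data frame itself
lst[1]           # single brackets return a sub-list of length 1, not the vector
lst$info$name    # chain accessors to reach inside nested objects: "Mo" "Tom"
```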
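# Install BiocManager only if it is not already available, then use it to install the ALL data package
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("ALL")
# Load the ALL package; it also attaches Biobase, which provides pData()
library(ALL)
data(ALL)
# Extract the sample (phenotype) annotation table as a data frame
df2 <- pData(ALL)
# Summarize four clinical variables
summary(df2[, c("age", "sex", "BT", "relapse")])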
## age sex BT relapse
## Min. : 5.00 F :42 B2 :36 Mode :logical
## 1st Qu.:19.00 M :83 B3 :23 FALSE:35
## Median :29.00 NA's: 3 B1 :19 TRUE :65
## Mean :32.37 T2 :15 NA's :28
## 3rd Qu.:45.50 B4 :12
## Max. :58.00 T3 :10
## NA's :5 (Other):13
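For a logical column such as `relapse`, `summary()` reports only the counts of TRUE, FALSE and NA. R treats TRUE as 1 and FALSE as 0, so `sum()` and `mean()` turn those values into a count and a proportion. The sketch below stays self-contained by rebuilding a toy vector with the same counts as the `relapse` summary above (65 TRUE, 35 FALSE, 28 NA). The row order is not the real one. On the real data you would use `df2$relapse` directly.

```r
# Toy vector with the same counts as df2$relapse in the summary above (order is not the real one)
relapse <- rep(c(TRUE, FALSE, NA), times = c(65, 35, 28))

length(relapse)                  # 128 patients in total
sum(is.na(relapse))              # 28 patients with unknown relapse status
sum(relapse, na.rm = TRUE)       # 65 patients who relapsed (TRUE counts as 1)
mean(relapse, na.rm = TRUE)      # 0.65, the proportion relapsing among the 100 with known status
table(relapse, useNA = "ifany")  # counts of FALSE, TRUE and NA in one table
```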
dim(df2)
## [1] 128 21
str(df2)
## 'data.frame': 128 obs. of 21 variables:
## $ cod : chr "1005" "1010" "3002" "4006" ...
## $ diagnosis : chr "5/21/1997" "3/29/2000" "6/24/1998" "7/17/1997" ...
## $ sex : Factor w/ 2 levels "F","M": 2 2 1 2 2 2 1 2 2 2 ...
## $ age : int 53 19 52 38 57 17 18 16 15 40 ...
## $ BT : Factor w/ 10 levels "B","B1","B2",..: 3 3 5 2 3 2 2 2 3 3 ...
## $ remission : Factor w/ 2 levels "CR","REF": 1 1 1 1 1 1 1 1 1 1 ...
## $ CR : chr "CR" "CR" "CR" "CR" ...
## $ date.cr : chr "8/6/1997" "6/27/2000" "8/17/1998" "9/8/1997" ...
## $ t(4;11) : logi FALSE FALSE NA TRUE FALSE FALSE ...
## $ t(9;22) : logi TRUE FALSE NA FALSE FALSE FALSE ...
## $ cyto.normal : logi FALSE FALSE NA FALSE FALSE FALSE ...
## $ citog : chr "t(9;22)" "simple alt." NA "t(4;11)" ...
## $ mol.biol : Factor w/ 6 levels "ALL1/AF4","BCR/ABL",..: 2 4 2 1 4 4 4 4 4 2 ...
## $ fusion protein: Factor w/ 3 levels "p190","p190/p210",..: 3 NA 1 NA NA NA NA NA NA 1 ...
## $ mdr : Factor w/ 2 levels "NEG","POS": 1 2 1 1 1 1 2 1 1 1 ...
## $ kinet : Factor w/ 2 levels "dyploid","hyperd.": 1 1 1 1 1 2 2 1 1 NA ...
## $ ccr : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ relapse : logi FALSE TRUE TRUE TRUE TRUE TRUE ...
## $ transplant : logi TRUE FALSE FALSE FALSE FALSE FALSE ...
## $ f.u : chr "BMT / DEATH IN CR" "REL" "REL" "REL" ...
## $ date last seen: chr NA "8/28/2000" "10/15/1999" "1/23/1998" ...
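The `str()` output shows that `diagnosis` and `date.cr` are stored as character strings in month/day/year form. Converting them with `as.Date()` lets R do date arithmetic, for example the number of days from diagnosis to complete remission. The sketch below hard-codes the first four values of each column as printed by `str()` above. On the full data you would pass `df2$diagnosis` and `df2$date.cr` instead.

```r
# First four values of df2$diagnosis and df2$date.cr as printed by str() above
diagnosis <- c("5/21/1997", "3/29/2000", "6/24/1998", "7/17/1997")
date_cr   <- c("8/6/1997", "6/27/2000", "8/17/1998", "9/8/1997")

# %m = month, %d = day, %Y = four-digit year; single-digit months and days parse fine
dx <- as.Date(diagnosis, format = "%m/%d/%Y")
cr <- as.Date(date_cr, format = "%m/%d/%Y")

# Days from diagnosis to complete remission for each patient
days_to_cr <- as.numeric(difftime(cr, dx, units = "days"))
days_to_cr   # 77 90 54 53
```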
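# Count how many patients have each age, then plot the counts
af <- table(df2$age)
barplot(af, main = "Age frequency", xlab = "Age (years)", ylab = "Number of patients")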
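Calling `table()` on raw ages produces one bar per distinct age, which can be sparse. One alternative, not used in the original material, is to bin ages into ranges with `cut()` before counting. The sketch below uses the first ten ages printed by `str()` above. The break points are an arbitrary choice for illustration. On the full data, missing ages stay NA and `table()` drops them by default.

```r
# First ten values of df2$age as printed by str() above
ages <- c(53, 19, 52, 38, 57, 17, 18, 16, 15, 40)

# Bin into age ranges; right = FALSE makes intervals closed on the left, e.g. [18, 40)
age_group <- cut(ages,
                 breaks = c(0, 18, 40, 60),
                 right  = FALSE,
                 labels = c("0-17", "18-39", "40-59"))

group_counts <- table(age_group)
group_counts   # 0-17: 3, 18-39: 3, 40-59: 4
barplot(group_counts, main = "Patients per age group",
        xlab = "Age group (years)", ylab = "Number of patients")
```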
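### Descriptive statistics for age
mn <- mean(df2$age)                  # returns NA because age contains missing values
md <- median(df2$age)                # returns NA for the same reason
mn <- mean(df2$age, na.rm = TRUE)    # na.rm = TRUE drops the NAs first, so this works
md <- median(df2$age, na.rm = TRUE)  # this works too
std <- sd(df2$age, na.rm = TRUE)     # standard deviation
vr <- var(df2$age, na.rm = TRUE)     # variance (the square of the standard deviation)
mxx <- max(df2$age, na.rm = TRUE)    # oldest patient
mnn <- min(df2$age, na.rm = TRUE)    # youngest patient
age_dist <- table(df2$age)           # frequency of each age (same as af above)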
# summary() reports the minimum, quartiles, median, mean, maximum and NA count in a single call
summary(df2[, c("age", "sex", "BT", "relapse")])
## age sex BT relapse
## Min. : 5.00 F :42 B2 :36 Mode :logical
## 1st Qu.:19.00 M :83 B3 :23 FALSE:35
## Median :29.00 NA's: 3 B1 :19 TRUE :65
## Mean :32.37 T2 :15 NA's :28
## 3rd Qu.:45.50 B4 :12
## Max. :58.00 T3 :10
## NA's :5 (Other):13
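The age statistics computed one by one above can also be bundled into a small helper that reports the number of missing values alongside them. The sketch below uses `describe_numeric()`, a hypothetical name that is not part of base R. The toy input is four ages from the `str()` output plus one NA. On the real data you would call `describe_numeric(df2$age)`.

```r
# Hypothetical helper (not a base R function): descriptive statistics that ignore NAs
describe_numeric <- function(x) {
  c(n         = sum(!is.na(x)),          # number of non-missing values
    n_missing = sum(is.na(x)),           # number of missing values
    mean      = mean(x, na.rm = TRUE),
    median    = median(x, na.rm = TRUE),
    sd        = sd(x, na.rm = TRUE),
    min       = min(x, na.rm = TRUE),
    max       = max(x, na.rm = TRUE))
}

# Toy input: four ages from the str() output above plus one missing value
ages <- c(53, 19, 52, 38, NA)
mean(ages)              # NA: a single missing value makes the result NA
describe_numeric(ages)  # n = 4, n_missing = 1, mean = 40.5, median = 45, min = 19, max = 53
```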